Zhenbo Luo

Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation

Feb 03, 2026
Via arXiv

Restoring Exploration after Post-Training: Latent Exploration Decoding for Large Reasoning Models

Feb 02, 2026
Via arXiv

GAIA: A Data Flywheel System for Training GUI Test-Time Scaling Critic Models

Jan 26, 2026
Via arXiv

Federated Balanced Learning

Jan 20, 2026
Via arXiv

Federated Joint Learning for Domain and Class Generalization

Jan 18, 2026
Via arXiv

Think-Clip-Sample: Slow-Fast Frame Selection for Video Understanding

Jan 16, 2026
Via arXiv

Xiaomi MiMo-VL-Miloco Technical Report

Dec 22, 2025
Via arXiv

REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding

Nov 17, 2025
Via arXiv

HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration

Oct 31, 2025
Via arXiv

BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent

Sep 19, 2025
Via arXiv